The estimation and use of predictions for the assessment of model performance using large samples with multiply imputed data
Abstract
Multiple imputation can be used as a tool in the process of constructing prediction models in medical and epidemiological studies with missing covariate values. Such models can be used to make predictions for model performance assessment, but the task is made more complicated by the multiple imputation structure. We summarize various predictions constructed from covariates, including multiply imputed covariates, and either the set of imputation-specific prediction model coefficients or the pooled prediction model coefficients. We further describe approaches for using the predictions to assess model performance. We distinguish between ideal model performance and pragmatic model performance, where the former refers to the model's performance in an ideal clinical setting where all individuals have fully observed predictors and the latter refers to the model's performance in a real-world clinical setting where some individuals have missing predictors. The approaches are compared through an extensive simulation study based on the UK700 trial. We determine that measures of ideal model performance can be estimated within imputed datasets and subsequently pooled to give an overall measure of model performance. Alternative methods are required to evaluate pragmatic model performance, and we propose constructing predictions either from a second set of covariate imputations which make no use of observed outcomes, or from a set of partial prediction models constructed for each potential pattern of observed covariates. Pragmatic model performance is generally lower than ideal model performance. We focus on model performance within the derivation data, but describe how to extend all the methods to a validation dataset.
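The "estimate within each imputed dataset, then pool" approach to ideal model performance can be illustrated with a minimal sketch. The abstract does not specify the performance measure or the pooling rule, so everything here is an assumption made for illustration: the Brier score as the measure, a simple average of the imputation-specific estimates as the pooled value, a fully observed binary outcome, and the hypothetical names `brier_score` and `pooled_performance`.

```python
import numpy as np


def brier_score(y, p):
    """Mean squared difference between binary outcomes and predicted probabilities."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.mean((y - p) ** 2))


def pooled_performance(y, predictions_by_imputation, measure=brier_score):
    """Evaluate `measure` within each imputed dataset, then average the estimates.

    Assumptions (not taken from the abstract): the outcome `y` is fully
    observed, and pooling is a plain average over the M imputations.
    """
    estimates = [measure(y, p) for p in predictions_by_imputation]
    return float(np.mean(estimates)), estimates


# Toy data: 4 individuals, M = 2 imputed datasets, one prediction vector each.
y = np.array([0, 1, 1, 0])
preds = [
    np.array([0.2, 0.8, 0.6, 0.4]),  # predictions from imputed dataset 1
    np.array([0.1, 0.9, 0.7, 0.3]),  # predictions from imputed dataset 2
]
pooled, per_imputation = pooled_performance(y, preds)
print(per_imputation, pooled)
```

Any other measure with the signature `measure(y, p)` could be passed in place of `brier_score`; some measures may be better pooled on a transformed scale, which this sketch does not attempt.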